Confidence metrics based on n-gram language model backoff behaviors

Authors

  • C. Uhrik
  • W. Ward
Abstract

We report results from using language model confidence measures based on the degree of backoff used in a trigram language model. Both utterance-level and word-level confidence metrics proved useful for a dialog manager to identify out-of-domain utterances. The metric assigns successively lower confidence as the language model estimate is backed off to a bigram or unigram, and it also bases its estimates on sequences of backoff degrees. Experimental results with utterances from the domain of medical records management showed that the distributions of the confidence metric for in-domain and out-of-domain utterances are separated. Use of the corresponding word-level confidence metric shows similarly encouraging results.
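The abstract does not give a formula, so the following is a minimal sketch of the idea, assuming an n-gram LM that reports, for each word, the backoff degree it used (0 = trigram found, 1 = backed off to bigram, 2 = backed off to unigram). The per-degree weights, the run penalty, and the example degree sequences are all hypothetical, not the authors' exact formulation.

```python
# Hypothetical per-degree confidence weights: deeper backoff -> lower confidence.
DEGREE_CONFIDENCE = {0: 1.0, 1: 0.6, 2: 0.2}

def word_confidences(backoff_degrees):
    """Word-level confidence: one score per word from its backoff degree."""
    return [DEGREE_CONFIDENCE[d] for d in backoff_degrees]

def utterance_confidence(backoff_degrees):
    """Utterance-level confidence: average word confidence, with a penalty
    for runs of deep backoff (the 'sequences of backoff degree' idea:
    consecutive unigram backoffs are stronger out-of-domain evidence
    than isolated ones)."""
    scores = word_confidences(backoff_degrees)
    penalty = sum(0.1 for a, b in zip(backoff_degrees, backoff_degrees[1:])
                  if a == 2 and b == 2)
    return max(0.0, sum(scores) / len(scores) - penalty)

# Illustrative degree sequences for a five-word utterance.
print(utterance_confidence([0, 0, 1, 0, 0]))   # mostly trigram hits -> high
print(utterance_confidence([2, 2, 1, 2, 2]))   # mostly unigram backoff -> low
```

A dialog manager could then compare the utterance-level score against a rejection threshold and reprompt, or route the utterance elsewhere, when it looks out-of-domain.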

Similar articles

Scalable Trigram Backoff Language Models

When a trigram backoff language model is created from a large body of text, trigrams and bigrams that occur few times in the training text are often excluded from the model in order to decrease the model size. Generally, the elimination of n-grams with very low counts is believed to not significantly affect model performance. This project investigates the degradation of a trigram backoff model’...
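The count-cutoff pruning this excerpt describes can be sketched in a few lines. This is a generic illustration of the conventional approach, not this paper's exact procedure; the counts and the cutoff value are made up.

```python
from collections import Counter

def prune_by_count(ngram_counts: Counter, cutoff: int) -> Counter:
    """Drop n-grams whose training count is <= cutoff before building the model."""
    return Counter({ng: c for ng, c in ngram_counts.items() if c > cutoff})

trigram_counts = Counter({
    ("the", "patient", "record"): 42,
    ("delete", "progress", "note"): 1,   # singleton: pruned at cutoff=1
})
print(prune_by_count(trigram_counts, cutoff=1))  # only the frequent trigram survives
```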

Fast Neural Network Language Model Lookups at N-Gram Speeds

Feed-forward neural network language models (NNLMs) have shown consistent gains over backoff word n-gram models in a variety of tasks. However, backoff n-gram models still remain dominant in applications with real-time decoding requirements, as word probabilities can be computed orders of magnitude faster than with an NNLM. In this paper, we present a combination of techniques that allows us to speed...

Distribution-Based Pruning of Backoff Language Models

We propose a distribution-based pruning of n-gram backoff language models. Instead of the conventional approach of pruning n-grams that are infrequent in the training data, we prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution, i.e. the probability that an n-gram occurs in a new document. Experimental results show that our method performe...
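A rough sketch of the idea, under the assumption that "likely to be infrequent in a new document" is approximated by document frequency in the training collection; the paper's actual estimator may differ, and the threshold value is hypothetical.

```python
def doc_occurrence_prob(ngram, documents):
    """Estimate P(ngram occurs in a new document) by its document frequency."""
    hits = sum(1 for doc in documents if ngram in doc)
    return hits / len(documents)

def prune_by_distribution(ngrams, documents, threshold=0.05):
    """Keep only n-grams whose estimated new-document probability exceeds the threshold."""
    return [ng for ng in ngrams if doc_occurrence_prob(ng, documents) > threshold]

# Toy collection: each document is the set of trigrams it contains.
docs = [
    {("the", "patient", "record"), ("progress", "note", "added")},
    {("the", "patient", "record")},
]
candidates = [("the", "patient", "record"), ("progress", "note", "added")]
print(prune_by_distribution(candidates, docs, threshold=0.6))
# -> [('the', 'patient', 'record')]: the trigram seen in only half the
#    documents is pruned even though it is not rare in raw counts.
```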

Hierarchical class n-gram language models: towards better estimation of unseen events in speech recognition

In this paper, we show how a multi-level class hierarchy can be used to better estimate the likelihood of an unseen event. In classical backoff n-gram models, the (n-1)-gram model is used to estimate the probability of an unseen n-gram. In the approach we propose, we use a class hierarchy to define an appropriate context which is more general than the unseen n-gram but more specific than the (n...
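A toy sketch of class-hierarchy backoff, assuming a simple word → class → superclass map: when the literal trigram is unseen, the context word is generalized through the hierarchy before falling back to the bigram. The hierarchy, the probability table, and all names and values here are invented for illustration and are not this paper's model.

```python
# Hypothetical hierarchy: word -> class -> superclass.
PARENT = {
    "aspirin": "DRUG", "ibuprofen": "DRUG",
    "DRUG": "NOUN",
}

def generalizations(token):
    """Yield the token itself, then its successively more general classes."""
    while token is not None:
        yield token
        token = PARENT.get(token)

def prob(w, context, table):
    """P(w | context): try the literal trigram, then class-generalized
    trigrams, and only then the classical bigram backoff."""
    w1, w2 = context
    for g in generalizations(w2):           # generalize the nearest context word
        p = table.get((w1, g, w))
        if p is not None:
            return p
    return table.get((w2, w), 1e-6)         # bigram backoff, tiny floor

# Toy table with a class-level trigram entry standing in for unseen word trigrams.
table = {("prescribe", "DRUG", "daily"): 0.3, ("aspirin", "daily"): 0.05}
print(prob("daily", ("prescribe", "ibuprofen"), table))  # -> 0.3 via the DRUG class
```

The point of the intermediate class contexts is that they are more specific than the plain bigram, so the estimate for an unseen event loses less information than a direct (n-1)-gram backoff.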

Publication year: 1997